fix(prom/scrape): recreate internal scraper after reloaded #5541

Closed

Contributor

@hainenber hainenber commented Oct 19, 2023

PR Description

Recreate the internal scrape manager of the prometheus.scrape Flow component when it receives a reload signal, so that updated options take effect without a restart.

Which issue(s) this PR fixes

Fixes grafana/alloy#329

Notes to the Reviewer

Let me know if a unit test is required, or if the current setup is enough to guard against regressions; a straightforward test for this seems pretty convoluted to write.

PR Checklist

  • CHANGELOG.md updated
  • [] Documentation added
  • Tests updated
  • Config converters updated

@hainenber hainenber requested a review from a team as a code owner October 19, 2023 17:13
@hainenber hainenber force-pushed the allow-reload-internal-prom-scraper branch from f2d22dc to 53e2238 Compare October 19, 2023 17:14
Member

@rfratto rfratto left a comment

Thanks! This is heading in the right direction but needs some more work before it's mergeable.

Can you write a unit test for this? It would be helpful for us to validate the fix with automation instead of reviewing by eye.

Comment on lines +164 to +172
// Update created component with prometheus.Fanout and prometheus.ScrapeManager
data, err = o.GetServiceData(http.ServiceName)
if err != nil {
	return nil, fmt.Errorf("failed to get information about HTTP server: %w", err)
}
httpData := data.(http.Data)
flowAppendable, scraper := c.createPromScrapeResources(httpData, args)
c.appendable = flowAppendable
c.scraper = scraper
Member

Is this necessary, if the same code will run in Update on line 175?

Contributor Author

I was trying to retain the previous setup. Now that you mention it, it does look redundant. Let's see if the upcoming unit test shows this step is unnecessary; if so, I'll remove it.

}
newFlowAppendables, newScraper := c.createPromScrapeResources(data.(http.Data), newArgs)
c.appendable = newFlowAppendables
c.scraper = newScraper
Member

There's an issue here: the Run method runs the original c.scraper, and will never terminate that scraper/run the new one that's created. This means the changes here on subsequent updates will never truly apply.

Contributor Author

I totally missed this one 🫣

But I do think we can rewrite the scraper's Run goroutine as a for-select loop to handle scraper restarts.

Let me try my luck on this.

	return fmt.Errorf("failed to get information about HTTP server: %w", err)
}
newFlowAppendables, newScraper := c.createPromScrapeResources(data.(http.Data), newArgs)
c.appendable = newFlowAppendables
Member

I don't think we have to recreate the appendables; the call to UpdateChildren on line 240 should be sufficient, and we can return to having only one instance of c.appendable for the lifetime of the component.

Contributor Author

Sounds great, -1 work for me then 😄
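
A rough sketch of why a single long-lived appendable can be enough, as suggested in the review comment above. The `fanout`, `sink`, and `recorder` types here are hypothetical simplifications and are not the real prometheus.Fanout; only the method name `UpdateChildren` is taken from the discussion. The point is that the fanout instance stays the same while its downstream children are swapped in place.

```go
package main

import (
	"fmt"
	"sync"
)

// sink is a hypothetical, simplified stand-in for a downstream receiver.
type sink interface {
	Append(sample string)
}

// fanout forwards each sample to whatever children are currently configured.
type fanout struct {
	mu       sync.RWMutex
	children []sink
}

// UpdateChildren replaces the downstream receivers without recreating the fanout.
func (f *fanout) UpdateChildren(children []sink) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.children = children
}

func (f *fanout) Append(sample string) {
	f.mu.RLock()
	defer f.mu.RUnlock()
	for _, c := range f.children {
		c.Append(sample)
	}
}

// recorder remembers every sample it receives.
type recorder struct{ got []string }

func (r *recorder) Append(sample string) { r.got = append(r.got, sample) }

func main() {
	a, b := &recorder{}, &recorder{}
	f := &fanout{} // created once for the lifetime of the component

	f.UpdateChildren([]sink{a})
	f.Append("s1")
	f.UpdateChildren([]sink{b}) // simulates a config reload changing forward_to
	f.Append("s2")

	fmt.Println(a.got, b.got) // prints "[s1] [s2]"
}
```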

@hainenber
Contributor Author

Alrighty, I'll try writing one 😄

@rfratto
Member

rfratto commented Oct 19, 2023

@hainenber You might find the componenttest package useful here to test components (Look for other unit tests which use it to get examples).

This will handle running/updating the component for you so you can focus your unit test on creating a fake scrape target and determining whether protobuf negotiation is actually enabled. You'll know protobuf negotiation is being used by looking at the Accept header sent to scrape targets; if it includes vnd.google.protobuf, then protobuf negotiation is enabled.

@hainenber
Contributor Author

Hi there, I've found making the goroutine in the Run() function reloadable pretty tough, and I think it's much better to fix this upstream, namely in prometheus/prometheus.

I've made a PR to address it. Once the patch is merged, hopefully someone can cherry-pick it into a new grafana/prometheus 😄

For now, I'm closing this PR in anticipation.

@hainenber hainenber closed this Oct 22, 2023
@hainenber
Contributor Author

Turns out the enable_protobuf_negotiation option has been deprecated in the latest Prometheus.

The new option, scrape_protocols, can be specified in scrape_configs, so a normal agent reload will fix the original issue.

My PR on prometheus/prometheus is still open; it allows reloading the manager with less diffused options, such as extra_metrics.

@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Feb 21, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2024
Successfully merging this pull request may close these issues.

Enabling enable_protobuf_negotiation in prometheus.scrape component requires restart